Identifiers-Seattle Performance Appraisal Guide

This study is a followup conducted 1 year after the 1966 study (ED 001 162)
which attempted to assess the effects of reduced loads and inservice help on the
classroom behavior of 120 beginning teachers. A conclusion of the original study was
that the experimental group showed at least 25% higher scores on teaching
performances than the control group; the followup study was designed to determine if
the relative differences in teaching behavior persisted. Subjects were 10 randomly
selected members from each of the original 4 experimental groups. Four observers, 2
of whom participated in the original study, were trained by Harry L. Garrison, who had
trained the original observers in the use of the same instrument. Each team of 2
observers appraised half of the 40 subjects, and the reliability of observers' ratings
was assessed statistically. There were no significant differences when analysis of
variance was computed among the 4 groups for each of 10 variables. Results
indicated that differences among the 4 groups tended to become smaller after a
year; however, most of the variability was accounted for by the atypical scores of 5
subjects. Judging by means of observer ratings, the total group of 40 tended to show
small positive gains in teaching performance standards. Included are 3 statistical
tables and the reactions of Garrison, who designed the observation instruments used.
FOLLOW-UP STUDY: LONG-TERM EFFECTS OF MODIFIED

INTERNSHIP FOR BEGINNING ELEMENTARY TEACHERS 



In 1966 Professor Herbert Hite, College of Education, Washington State University,
published an interesting report on an experimental project in teacher training,
"Effects of Reduced Loads and Intensive Inservice Training Upon the Classroom
Behavior of Beginning Elementary Teachers."

The Staff and Trustees of SIRS were pleased to help, financially, in a follow-up
study which Professor Hite organized and administered. And we are happy to
publish Professor Hite's report on this second study.

Since Dr. Harry Garrison aided Professor Hite in the project by designing instruments
by which performance was appraised, we asked him to read the manuscript and
react. Dr. Garrison's background includes serving the Seattle Public Schools as
Personnel Assistant, Evaluation, and as Coordinator of Student Teaching, Western
Washington State College.

We have included Dr. Garrison's letter for its thought-provoking values.



Morton A. Johnson 
FOLLOW-UP STUDY: LONG-TERM EFFECTS OF MODIFIED

INTERNSHIP FOR BEGINNING ELEMENTARY TEACHERS 

by Herbert Hite, Professor
Washington State University 



INTRODUCTION 

In this follow-up study, the teaching of 40 second-year elementary teachers was
appraised to determine if the effects of special treatments in the previous year
would still be apparent. The 40 teachers were part of a group of approximately 120
beginning elementary teachers who were the subjects of an experiment to assess
effects of reduced loads and inservice help on the classroom behavior of beginning
teachers.¹

In the original study, in 1965-66, the beginning elementary teachers were first
matched and then assigned to four different groups. Group I was released from
25 per cent of classroom time for preparations and to confer with a supervisor who
observed the teacher's performance during the week. Group II was released from
25 per cent of classroom teaching time and used this time partly to visit experienced
teachers with similar assignments. Group III teachers were assigned about
25 per cent fewer pupils than the average of the classrooms in that school district.
Group IV teachers were a control group and received no special treatment of released
time or special inservice help. A conclusion of the original study was that the
experimental group showed at least 25 per cent higher scores on teaching performances
than the control group. The purpose of the follow-up study was to determine if the
relative differences in teaching behavior persisted after a period of one year.

PROCEDURES OF THE FOLLOW-UP STUDY 

The project staff corresponded with the elementary schools that participated in the
first study to locate all of the original group of subjects who were still teaching
in their original assignments. The staff then randomly selected 12 members of each
of the original four experimental groups. They sent each of these 48 selected
subjects a post-card questionnaire inviting them to participate in the follow-up
study. Enough returns were received to form four groups of ten members each.

Four observers were experienced elementary teachers. Two had served as observers
in the original study; the other two were substitute teachers in one of the
participating school districts. The observers learned to use the Seattle Performance
Appraisal Guide, the same instrument used in the original study, under the
direction of Dr. Harry L. Garrison. Dr. Garrison, who also trained the observers
in the first study, had the four study the Appraisal Guide and practice using it to
judge video-taped performances of experienced teachers. The observers could in
this way compare their appraisals with each other's and examine the taped performances
repeatedly. After this training experience, the four practiced by
visiting and appraising cadet teachers.



¹ Herbert Hite, "Effects of Reduced Loads and Intensive Inservice Training Upon
the Classroom Behavior of Beginning Elementary Teachers." Coop Research Project
#2973, U. S. Office of Education, Washington State Superintendent of Public
Instruction and WSU, 1966.
For the purpose of this follow-up study, a team of two observers was to appraise
each of the 40 subjects, ten members from each original group, one time. The team
of observers arrived at a prearranged time at the subject's classroom. They were
seated in the rear of the room, and independently observed and rated the teacher on
ten different teaching behaviors. Each team of two observers rated one half of the
subjects.

The observers' ratings were then analyzed by computer in the office of the State
Superintendent of Public Instruction.

RESULTS 

The first step in analyzing the data was to assess the reliability of the observers'
ratings. Table 1 shows correlation coefficients of each pair of observers for each
of the ten performances of the teachers which they rated. In order to compare
observer scores, Pearson product-moment coefficients, which were obtained for each
pair of observers, were converted to Fisher Z scores. As shown in Table 1, one
pair of observers agreed more closely than the other pair. The degree of observer
agreement was sufficiently high in the judgment of the project staff to warrant
further treatment of the observer data.

TABLE 1

OBSERVER CORRELATIONS(a) BY TEAM FOR EACH TEACHING BEHAVIOR

Observer                        Behaviors
Team        1    2    3    4    5    6    7    8    9   10    Overall

   1       .95  .92  .84  .88  .89  .86  .88  .89  .78  .86     .88
   2       .71  .74  .55  .64  .55  .61  .42  .50  .39  .28     .55

(a) Fisher Z scores: Henry E. Garrett, "Statistics in Psychology and Education,"
New York: David McKay Co., Inc., 1962, pages 132, 172.
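
The averaging of correlation coefficients through Fisher Z scores can be sketched as follows. This is a minimal illustration, not the project's actual computation: it assumes the "Overall" column is the mean of the ten coefficients taken in Z units and converted back to r, which the report does not state explicitly. The function names are hypothetical.

```python
import math

# Observer correlations from Table 1, behaviors 1 through 10.
TEAM_1 = [.95, .92, .84, .88, .89, .86, .88, .89, .78, .86]
TEAM_2 = [.71, .74, .55, .64, .55, .61, .42, .50, .39, .28]


def fisher_z(r):
    """Convert a Pearson r to Fisher's Z (the inverse hyperbolic tangent)."""
    return math.atanh(r)


def mean_correlation(rs):
    """Average correlations in Z units, then convert the mean back to r.

    Assumption: this is how an 'Overall' coefficient would be formed.
    """
    mean_z = sum(fisher_z(r) for r in rs) / len(rs)
    return math.tanh(mean_z)


if __name__ == "__main__":
    for name, rs in (("Team 1", TEAM_1), ("Team 2", TEAM_2)):
        print(name, round(mean_correlation(rs), 2))
```

Under that assumption the two averages round to .88 and .55, the Overall values shown in Table 1.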



The data for analyzing changes in ten teacher performances on the part of the 40
subjects are in Table 2. The mean ratings for each observer team on the occasion
of the last appraisal of the original study appear as scores under "Round 4" in the
Table. This mean observer rating is of the scores of only the ten teachers who
were members of each group in the follow-up study. The mean ratings for each
observer team for each group appear as scores under "Round 5" in the Table. Next
to these two scores for each group for each behavior is the difference between
the two mean ratings. The Table, then, shows the difference between appraisals of
the four groups of teachers at the end of the original study and appraisals made
slightly over one year later.

Analyses of variance were computed for the differences among the four groups of
subjects for each variable. There were no significant differences.
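
A one-way analysis of variance of this kind can be sketched as follows. The report's analyses were run separately for each variable; as an illustration only, this sketch applies the same test to the overall per-teacher differences listed in Table 3 (Round 5 minus Round 4), with the values taken as transcribed there. The function name is hypothetical.

```python
# Round 5 minus Round 4 differences per teacher, as transcribed from Table 3.
DIFFS = {
    1: [-.87, 1.11, -1.29, -.12, .48, .20, 3.25, .67, .14, 1.72],
    2: [.42, -1.39, -4.18, -.44, -.28, 1.16, -.21, .04, -.70, 2.15],
    3: [.07, 1.44, 1.20, .36, 2.88, .97, -.41, -.84, -3.27, 1.92],
    4: [.99, -.98, -.80, .23, 1.24, 4.81, .12, -.44, .69, 3.60],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    f_ratio, df_b, df_w = one_way_anova(list(DIFFS.values()))
    print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

With these values the F ratio comes out near 1, well under the tabled .05 critical value of roughly 2.87 for 3 and 36 degrees of freedom, which is in line with the finding of no significant differences.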
DISCUSSION

1. Because only ten members of each of the original four groups were members of
the four groups in the follow-up study, the mean ratings for Round 4 were not those
of the original groups. The ten members of Groups I, III and IV had lower mean
ratings than their respective original groups; Group II in the follow-up study had
mean ratings that were higher on Round 4 than their original total group. Thus,
the relative rank of the four groups on Round 4 was different from the original
ranking in which Group IV was considerably lower than the other three, and Group
III was clearly highest of all.

2. The differences among the four treatment groups tended to become smaller after 
a year. The highest scores on Round 4 (Group II) tended to move in a negative 
direction; the lowest scores on Round 4 (Group IV) tended to move in a positive 
direction. This general observation, however, is modified by the following 
observation. 

3. Most of the variability among the groups on Round 5 was accounted for by the
scores of five individuals. These five teachers were rated differently on Round 5
compared to Round 4 by at least three points on a seven-point scale. (See Table 3.)
This amount of change over the year period was not typical of the general group.

In the follow-up study the greatest gains were apparently made by Group IV,
while the negative-direction changes were scored for Group II. If the scores of
three individuals are not used, however, the two groups show almost the same amount
of difference between Rounds 4 and 5: .13 and .08 respectively.

The dramatic changes in observer ratings of five individual teachers cannot be
attributed to the original experimental treatments, because the five were members
of all four groups; two members from Group IV were in the group of five. Possibly
there were real changes in teaching behavior of this magnitude. There is also the
possibility that the observer teams saw an atypical teaching performance on the
day they appraised these five individuals. There is also the possibility of a
systematic bias on the part of the observers. At any rate, the staff feels that
the changes in observer ratings for these five teachers cannot be considered an
outcome of the special treatments, or of the lapse in time.

4. The general tendency for the total group of 40 teachers was to show small
positive gains in teaching performance standards, judging by the rather slim
evidence of mean observer ratings. Twenty-four of the 40 made positive gains.

5. The greatest gains of the total group of 40 teachers were on Variable 3,
Use of Resources; Variable 5, Organizing the Class; and Variable 8, Student
Participation.
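
The effect of the five atypical cases discussed in point 3 can be checked with a short sketch. It is an illustration under stated assumptions: the per-teacher differences are taken as transcribed in Table 3, "atypical" is read as an absolute change of at least three points, and the helper names are hypothetical.

```python
# Round 5 minus Round 4 differences per teacher, as transcribed from Table 3.
DIFFS = {
    1: [-.87, 1.11, -1.29, -.12, .48, .20, 3.25, .67, .14, 1.72],
    2: [.42, -1.39, -4.18, -.44, -.28, 1.16, -.21, .04, -.70, 2.15],
    3: [.07, 1.44, 1.20, .36, 2.88, .97, -.41, -.84, -3.27, 1.92],
    4: [.99, -.98, -.80, .23, 1.24, 4.81, .12, -.44, .69, 3.60],
}

THRESHOLD = 3.0  # "at least three points on a seven-point scale"


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def split_atypical(values, threshold=THRESHOLD):
    """Separate scores whose absolute change meets the threshold from the rest."""
    typical = [v for v in values if abs(v) < threshold]
    atypical = [v for v in values if abs(v) >= threshold]
    return typical, atypical


if __name__ == "__main__":
    for group, values in DIFFS.items():
        typical, atypical = split_atypical(values)
        print(f"Group {group}: all={mean(values):+.2f} "
              f"typical={mean(typical):+.2f} atypical={atypical}")
```

On the transcribed values this flags five teachers (one each in Groups I, II and III, and two in Group IV), and the means for Groups IV and II with those cases set aside come to about .13 and .08.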
SUMMARY

Observers' ratings as used in the original and follow-up studies are gross measurements
of teaching behavior. The criteria scale ranges from a possible low score of 1.0 to
a high of 7.0. In the two studies, beginning teachers tend to be rated lower than
experienced teachers. The possible change in scores on the type of scale used in the
Seattle Performance Appraisal Form makes it very unlikely that an Analysis of Variance
for groups of ten individuals will show statistically significant differences. In
spite of the small differences noted, there appears to be a tendency for beginning
teachers to show consistent and small gains from midway in the first year to the end
of the second year of teaching. The teachers in this study, in general, appeared
to show this tendency to improve regardless of the original experimental treatment.

That is, the relative advantages of teachers receiving the original treatment of
reduced load and/or inservice help were maintained.

This conclusion is difficult to prove statistically, for the reasons already stated.

The conclusion depends somewhat on the interpretation of the fact that a small number
of teachers were rated much differently from the original study and these differences
from first year to second year were not typical for the total group. If the atypical
cases are left out of the calculation, the original differences among the four groups
remain in about the same relationship after a period of one year.
TABLE 3

DIFFERENCE IN MEAN OBSERVER RATINGS ON ROUNDS 4 AND 5 FOR TOTAL
OF ALL TEN BEHAVIORS FOR EACH TEACHER BY GROUP

                      Difference Between
                      Rounds 4 and 5,
Teacher     Group     Variable 15

  033         1           - .87
  094         1            1.11
  047         1           -1.29
  036         1           - .12
  012         1             .48
  007         1             .20
  003         1            3.25
  058         1             .67
  099         1             .14
  085         1            1.72

  040         2             .42
  062         2           -1.39
  053         2           -4.18
  049         2           - .44
  041         2           - .28
  110         2            1.16
  104         2           - .21
  102         2             .04
  092         2           - .70
  079         2            2.15

  056         3             .07
  043         3            1.44
  032         3            1.20
  018         3             .36
  015         3            2.88
  013         3             .97
  004         3           - .41
  093         3           - .84
  090         3           -3.27
  068         3            1.92

  052         4             .99
  055         4           - .98
  030         4           - .80
  023         4             .23
  014         4            1.24
  011         4            4.81
  010         4             .12
  009         4           - .44
  006         4             .69
  108         4            3.60
DR. GARRISON'S REACTIONS TO PROFESSOR HITE'S FOLLOW-UP STUDY

I have reviewed the Follow-Up Study, Long-Term Effects of Modified Internship for
Beginning Elementary Teachers. I can add little to Herb's discussion of the data.
His analysis of the limitations of the study as a basis for conclusive proof of
the hypothesis that the three experimental treatments in the "modified internship"
of 1965-66 do have significant long-term effects extending into the second year of
teaching is sound.

In paragraph 1, page 3, of his discussion, Herb points out that the overall mean
performance score of each of the four groups of ten in January 1966 was different
from the original ranking of mean score of the total membership of each group at
that time. It was interesting to compare the rankings of the four groups of ten
each in January 1966 and in the spring of 1967.

1966 Ranking            Mean Overall     1967 Ranking            Mean Overall     Difference
                        Score                                    Score

Group II                3.66             Group II                3.32             - .34
Group III               3.03             Group III               3.27             + .24
Group I                 2.30             Group IV                3.08             + .95
Group IV                2.13             Group I                 2.85             + .55

The two bottom groups in 1966 made the greatest comparative gain in the ensuing
year. Yet Groups II and III maintained their top ranking, even though Group II
dropped in mean performance -.34, and Groups IV and I made the greatest jump in
mean performance, .95 and .55 respectively. As Herb points out, if the five
atypical teachers, two in Group IV, one in Group I, one in Group II, and one in
Group III, are not considered, the comparative rankings would be about the same
for 1966 and 1967, even though all groups were closer together. It would have
helped in the follow-up study if each group of ten had been a representative sample
of the January 1966 distribution of scores within each experimental group. The
regression phenomenon, the tendency of mean scores of subgroups to move in subsequent
measures toward the mean of the total population, may account in part for
the above changes.

The kind of inquiry expressed by Dr. Hite's studies, to which your organization
has contributed some needed encouragement, should be continued if we are to
identify and put into practice those treatments which really facilitate the long
term individualized career growth of our faculties in the years ahead. This
inquiry has to be collaborative, involving cooperative effort by at least these
agencies in this state:

(1) The individual teacher.

(2) The colleges which offer pre-service and graduate training.

(3) The employing district.

(4) The professional organizations in various teaching domains.

This collaboration will take time and money. Predictably, the investment will pay
rich dividends in the returns (educational productivity) from our investment in
classroom leadership. Perhaps the SIRS program will play a key role in the
collaboration. The TTT program and the recent State Department of Education studies
of teacher certification improvements are also promising.
As we continue the inquiry I have some suggestions which, if implemented, predictably
will yield better results than as yet achieved.

The first: Improvement of our criteria, our statement of those specific
teaching competences which define the general practitioner expectations
of the teacher in classroom leadership and later additional tasks expected
in various career specializations (example, the training of a teacher in
a specific domain).
The second: Improvement of tools and methods to carefully, systematically
appraise criteria performance. For example, the performance appraisal
guides used in the two Hite studies, although I feel they were very useful
compared with other available measures (I admit I am biased because I
designed these tools), have weaknesses. For any specific task we need to
develop a scale which yields performance scores considerably more sensitive
than a seven point spread, a range in scores from, say, 0 to 50, in which
even the master teacher would rarely achieve above 40 points. We need to
establish on such scales minimal qualifying scores for beginning certification,
which leave a considerable range to recognize future growth in the skill.

As we develop such scales, it is highly important that master teachers in
various subject matter and grade level domains be active participants.

For each skill or criterion, we need to define much more clearly an adequate
sample of the teaching upon which to base an appraisal of the skill. In the
Hite studies we used as an adequate sample one short observation of a lesson.
As Dr. Hite has so well pointed out, this one lesson may not be typical.
Perhaps four or five lessons would be barely adequate. For some skills,
particularly those involving preparation and evaluative tasks, perhaps a
conference with the teacher should be part of the appraised sample.

The third: Related to the above, we need to include in our appraisal data
the self-appraisal of the individual teacher, his own self-perception of
how well he demonstrates a specific criterion. It would have been
interesting, indeed, to get such data in a repeat of the Hite studies, or
even to use as performance scores data which an observer and the teacher
develop together. The use in the Hite studies of experienced elementary
teachers as observers and analyzers of teaching is, I believe, a constructive
development which should be continued with the collaboration of
the professional associations of experienced specialists. I anticipate
the possibility in the future that any teacher can call on a team of
fellow specialists to assist him in making a careful analysis and
diagnosis of his own practice in any given teaching situation.

The fourth: Long term, longitudinal studies, covering periods of three
to five years of career development, are highly desirable. Only in this
way can we get some answers to such important questions as explaining
atypical cases such as Dr. Hite described, and identifying causes of
costly teacher turnover and drop out from the profession. In such studies
we need to identify and explore the effect of many context factors, beyond
the individual teacher's control, which have positive or negative effects
on his teaching competences. Among the more important of such context
factors are these:

Differences in school buildings, facilities, technical obsolescence.

Differences in district programs and investments in special services,
guidance, libraries, instructional materials, teacher aide and
clerical staff, inservice training, department leadership patterns.

Differences across building faculties in teacher load, daily preparation,
faculty teaming communication systems, innovative climate, variation
in experience of staff, turnover.

Differences in student groupings, socio-cultural variations in the
communities served by school buildings, parent-teacher, and teacher-
community agency communication patterns.

Differences in teachers' outside-of-school roles and responsibilities.

Differences between teaching role expectations held by the teachers
themselves and those held by parents and students served by the schools.

Differences between teaching role expectations accepted by college
faculties, academic or departments of education, and role expectations
of the district and practicing specialists at various grade levels.

Differences in salary schedules and compensation patterns for
responsibilities, extra-curricular, or non-teaching roles assumed by
individual teachers.

Differences in the outside-of-school communication experiences of
learners which increasingly influence their educational development
inside the school, particularly variations in recreational and mass
habits, unique socio-cultural characteristics of their
environment.

One additional need, as we look to the future of continuous career development of
our teachers, focuses on the importance of obtaining, preferably from individual
teachers, information as to what action on their part was taken, if any, to use
or apply an appraisal of a specific teaching competence in subsequent career
development, whether it was an appraisal of the teacher's own performance or of
another's. Such information might help to identify useful career development
strategies and might help to explain atypical cases of career growth.

The above verbiage has probably been more confusing than helpful. I do believe we
are making definite progress in this state toward solving a very complex problem
of nation-wide significance. In spite of the lack of conclusive evidence, Dr.
Hite's studies point in the right direction. Hopefully we can continue this
exploration.




